Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-Genre Profiling, Clustering, Diarization, and Obfuscation
Abstract
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of digital text forensic research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and gender prediction from a cross-genre perspective; and (iii) author obfuscation, addressing author masking and obfuscation evaluation. In total, 35 teams participated across the three shared tasks of PAN 2016. Following the practice of previous editions, software submissions were required and were evaluated within the TIRA experimentation framework.
Similar resources
Overview of the PAN/CLEF 2015 Evaluation Lab
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification and author profiling studying important variations of these problem...
Clustering by Authorship Within and Across Documents
The vast majority of previous studies in authorship attribution assume the existence of documents (or parts of documents) labeled by authorship to be used as training instances in either closed-set or open-set attribution. However, in several applications it is not easy or even possible to find such labeled data and it is necessary to build unsupervised attribution models that are able to estim...
An Overview of the Traditional Authorship Attribution Subtask
This paper describes the Traditional Authorship Attribution subtask of the PAN/CLEF 2012 workshop. As a follow-up to our subtask at PAN/CLEF 2011 (Amsterdam), we established a new corpus for analysis for 2012 (Rome). The new corpus differed in several ways from the previous subtask: – Both the number and size of documents were decreased – The documents were taken from a different genre (fiction,...
Multi Feature Space Combination for Authorship Clustering
The Author Identification task for PAN 2016 consisted of three different subtasks: authorship clustering, authorship links and author diarization. We developed machine learning approaches for two of these three tasks. For the two authorship-related tasks we created various sets of feature spaces. The challenge was to combine these feature spaces to enable the machine learning algorithms t...
Overview of the Author Obfuscation Task at PAN 2017: Safety Evaluation Revisited
We report on the second large-scale evaluation of style obfuscation approaches in a shared task on author obfuscation, organized at the PAN 2017 lab on digital text forensics. Author obfuscation means to automatically paraphrase a given text such that state-of-the-art authorship verification approaches misjudge a given pair of documents as having been written by “different authors” if in fact t...